Efficient algorithms to compute compressed longest common substrings and compressed palindromes
Authors
Abstract
This paper studies two problems on compressed strings described in terms of straight line programs (SLPs). One is to compute the length of the longest common substring of two given SLP-compressed strings, and the other is to compute all palindromes of a given SLP-compressed string. To solve these problems efficiently, that is, in time polynomial in the compressed size, the strings cannot be decompressed, since the decompressed size can be exponentially large. We develop combinatorial algorithms that solve these problems in O(n^4 log n) time with O(n^3) space, and in O(n^4) time with O(n^2) space, respectively, where n is the size of the input SLP-compressed strings. © 2008 Elsevier B.V. All rights reserved.
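The abstract's point that decompression is not an option can be made concrete. What follows is a minimal Python sketch of the setting, not the authors' algorithm. It assumes a hypothetical encoding of an SLP as a list of rules, each either a single character or a pair of indices of earlier rules (X_i -> a or X_i -> X_l X_r with l, r < i). The names `Rule`, `lengths`, `char_at`, `decompress` and `doubling_slp` are ours. The sketch shows that an SLP with n rules can derive a string whose length is exponential in n, and that lengths and individual characters can still be obtained by working on the rules alone.

```python
from typing import List, Tuple, Union

# Hypothetical encoding (an assumption, not the paper's notation): rule i is
# either a terminal character or a pair (left, right) of indices of earlier
# rules, i.e. X_i -> a or X_i -> X_left X_right with left, right < i.
# The last rule derives the whole string.
Rule = Union[str, Tuple[int, int]]


def lengths(slp: List[Rule]) -> List[int]:
    """Length of the string derived by each rule, in O(n) arithmetic steps."""
    out: List[int] = []
    for rule in slp:
        if isinstance(rule, str):
            out.append(1)
        else:
            left, right = rule
            out.append(out[left] + out[right])
    return out


def char_at(slp: List[Rule], lens: List[int], pos: int) -> str:
    """Character at 0-based position `pos` of the derived string.

    Walks down from the last rule, choosing the left or right child by
    comparing `pos` with the left child's length, so the number of steps is
    bounded by the height of the SLP, not by the decompressed length.
    """
    i = len(slp) - 1
    if not 0 <= pos < lens[i]:
        raise IndexError(pos)
    while True:
        rule = slp[i]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if pos < lens[left]:
            i = left
        else:
            pos -= lens[left]
            i = right


def decompress(slp: List[Rule]) -> str:
    """Full expansion; feasible only for tiny SLPs."""
    parts: List[str] = []
    for rule in slp:
        if isinstance(rule, str):
            parts.append(rule)
        else:
            parts.append(parts[rule[0]] + parts[rule[1]])
    return parts[-1]


def doubling_slp(m: int) -> List[Rule]:
    """Hypothetical example SLP: X0 = a, X1 = b, X2 = X0 X1, then m rules
    X_{k+1} = X_k X_k. The last rule derives (ab)^(2^m), of length 2^(m+1)."""
    slp: List[Rule] = ["a", "b", (0, 1)]
    for _ in range(m):
        k = len(slp) - 1
        slp.append((k, k))
    return slp


if __name__ == "__main__":
    print(decompress(doubling_slp(2)))  # abababab
    big = doubling_slp(60)
    big_lens = lengths(big)
    print(len(big), big_lens[-1])  # 63 2305843009213693952
    print(char_at(big, big_lens, big_lens[-1] - 1))  # b
```

Here 63 rules describe a string of length 2^61, which could never be expanded in memory. Even so, `lengths` runs in time linear in the number of rules and `char_at` in time proportional to the SLP's height. The algorithms summarized in the abstract likewise operate on the rules rather than on the expanded text. This sketch only illustrates that setting; it does not reproduce their combinatorial machinery for longest common substrings or palindromes.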
Similar resources
Repeats and Palindromes: an Overview
With a long text string like DNA, repeats and palindromes are not easily spotted. Yet finding such substrings is important; for instance, repeats in DNA are indicators of certain hereditary disorders and are used as genetic markers. We discuss repeats and then palindromes, and then we relate the two. In our discussion of repeats, we first define an exact repeat and then five definitions of approximate r...
Computing Longest Common Substring and All Palindromes from Compressed Strings
This paper studies two problems on compressed strings described in terms of straight line programs (SLPs). One is to compute the length of the longest common substring of two given SLP-compressed strings, and the other is to compute all palindromes of a given SLP-compressed string. In order to solve these problems efficiently (in polynomial time w.r.t. the compressed size), decompression is never...
Finding Characteristic Substrings from Compressed Texts
Text mining from large-scale data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel p...
Suffix Trees and Suffix Arrays
Iowa State University. Contents: 1.1 Basic Definitions and Properties; 1.2 Linear Time Construction Algorithms (Suffix Trees vs. Suffix Arrays; Linear Time Construction of Suffix Trees; Linear Time Construction of Suffix Arrays; Space Issues); 1.3 Applications ...
Generalized Substring Compression
In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist. The queries contain an additional context substring (or a collection of context substrings) and the answers are the substring in compressed format, where the context substring is used to make the compression...
Journal title:
- Theor. Comput. Sci.
Volume: 410, Issue:
Pages: -
Publication date: 2009